Abstract

This paper proposes reasons why the CM model of Bahi and Michel (2008)
simulates gene mutations over time with good accuracy. We first justify
that the CM model is chaotic in the sense of Devaney. We then establish
that the inversions occurring in gene mutations do have a chaotic dynamic,
which makes the use of chaotic models relevant for gene evolution.

keywords: Genes evolution models; Inversions; Mathematical topology;
Devaney's chaos.



1 Introduction 

Codons are not uniformly distributed in the genome. Over time, mutations
have introduced variations in their frequency of occurrence. It is
therefore attractive to study the genetic patterns (blocks of more than one
nucleotide: dinucleotides, trinucleotides...) that appear and disappear
depending on mutation parameters. Mathematical models allow the prediction
of such an evolution, in such a way that the statistical values observed in
current genomes can be recovered.

A first model of genome evolution was proposed in 1969 by Thomas Jukes and
Charles Cantor [TB]. This first attempt was followed up by Motoo Kimura
[17], Joseph Felsenstein P2|, and Masami Hasegawa, Hirohisa Kishino, and
Taka-Aki Yano [15], respectively. These models differ in the number of
parameters they use, but all of them manipulate constant parameters.
However, they are rudimentary, as they only allow the study of nucleotide
evolution, not of genetic pattern mutations.

From 1990 to 1994, Didier Arques and Christian Michel proposed models based
on the RY purine/pyrimidine alphabet [H [3J [3J [H Q] . These models were
abandoned by their own authors in favor of models over the {A, C, G, T}
alphabet. More precisely, Didier Arques, Jean-Paul Fallot, and Christian
Michel proposed in [2] a first evolutionary model on the {A, C, G, T}
alphabet that is based on trinucleotides. As with the nucleotide-based
models, this new approach takes into account only constant parameters.

In 2004, Jacques M. Bahi and Christian Michel published a novel research
work in which the model of 1998 was improved by replacing constant
parameters with time-dependent parameters [S]. In this way, it became
possible to simulate a non-linear gene evolution. However, in the following
years, these researchers returned to models embedding constant parameters,
probably because the model of 2004 led to poor results: only one of the
twelve studied cases allows the recovery of values that are close to
reality. For instance, in 2006, Gabriel Frey and Christian Michel proposed
a model that uses 6 constant parameters Q3], whereas in 2007, Christian
Michel constructed a model with 9 constant parameters that generalizes
those of 1998 and 2006 [18]. Finally, Jacques M. Bahi and Christian Michel
recently introduced in [5] [10] a last model with 3 constant parameters,
but whose evolution matrix evolves over time. In other words, the
trinucleotides that have to mutate are not fixed: they are randomly picked
among a subset of potentially mutable trinucleotides.

This model, called "chaotic model" CM, allows a good recovery of various
statistical properties detected in the genome. Furthermore, this model
matches well the hypothesis of some primitive genes that have mutated over
time. In this paper, we wonder why the CM model gives good results.
Obviously, supposing that not all of the trinucleotides have to mutate at
each time is reasonable as, for instance, the stop codons have very small
mutation probabilities. However, such a biological claim is not sufficient
to explain the success of the CM model in simulating the dynamics of
mutations in genomes. Indeed, we have recently established that such a
model based on chaotic iterations is really chaotic, as defined in the
mathematical theory of chaos. Before this proof, the term "chaotic" in
these discrete iterations was only an adjective, having apparently no
obvious relation with the well-established Devaney's characterization of an
unpredictable behavior.

In this paper, we wonder why a model having chaotic dynamics gives, in a
certain way, better results than the standard model when predicting the
evolution of genomes through mutations. We will show that an important
mutation mechanism, namely the inversion, has a chaotic dynamic over time.
Consequences of this proof, for biology and evolution models, will finally
be discussed.

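The remainder of this research work is organized as follows. Section 2
recalls chaotic iterations and the discrete time chaotic evolution model.
Section 3 establishes that the CM model is chaotic according to Devaney.
Inversions are formalized in Section 4, and their chaotic behavior is
proven in Section 5. Consequences are discussed in Section 6, and Section 7
concludes the paper.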

2 Discrete time chaotic evolution model 
2.1 Chaotic Iterations 

Let us consider a system with a finite number $N \in \mathbb{N}^*$ of
elements (or cells), so that each cell has a Boolean state. A sequence of
length $N$ of Boolean states of the cells corresponds to a particular state
of the system. A sequence whose elements belong to
$\llbracket 1;N \rrbracket$ is called a strategy. The set of all strategies
is denoted by $\mathbb{S}$.

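Definition 1 The set $\mathbb{B}$ denoting $\{0,1\}$, let
$f : \mathbb{B}^N \to \mathbb{B}^N$ be a function and $S \in \mathbb{S}$ be
a strategy. The so-called chaotic iterations are defined by
$x^0 \in \mathbb{B}^N$ and

$$\forall n \in \mathbb{N}^*, \forall i \in \llbracket 1;N \rrbracket, \quad
x_i^n = \begin{cases}
x_i^{n-1} & \text{if } S^n \neq i, \\
\left(f(x^{n-1})\right)_{S^n} & \text{if } S^n = i.
\end{cases}$$

In other words, at the $n$-th iteration, only the $S^n$-th cell is
"iterated". Note that in a more general formulation, $S^n$ can be a subset
of components and $\left(f(x^{n-1})\right)_{S^n}$ can be replaced by
$\left(f(x^k)\right)_{S^n}$, where $k < n$, describing for example
transmission delays [19].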
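
As a minimal sketch of Definition 1 (not code from the cited works), the
iterations can be written in Python as follows. The helper names
`chaotic_iterations` and `negation` are hypothetical, the strategy is
truncated to a finite list of 1-based cell indices, and the vectorial
negation is only an illustrative choice for f.

```python
from typing import Callable, List, Sequence


def chaotic_iterations(
    f: Callable[[List[int]], List[int]],
    x0: Sequence[int],
    strategy: Sequence[int],
) -> List[int]:
    """Apply the chaotic iterations of f along a finite strategy prefix.

    At each step only the cell designated by the strategy (1-based index)
    is replaced by the corresponding component of f(x); the other cells
    keep their previous value.
    """
    x = list(x0)
    for s in strategy:
        fx = f(x)            # f is evaluated on the whole current state
        x[s - 1] = fx[s - 1]  # only cell s is updated
    return x


def negation(x: List[int]) -> List[int]:
    """Vectorial negation: flips every Boolean component (illustrative f)."""
    return [1 - b for b in x]


# Four cells, all initially 0, and the strategy prefix (2, 4, 2, 1).
final_state = chaotic_iterations(negation, [0, 0, 0, 0], [2, 4, 2, 1])
print(final_state)  # [1, 0, 0, 1]
```

Cell 2 is flipped twice and therefore returns to 0, while cells 4 and 1
are flipped once; cell 3 is never selected by this strategy prefix.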



2.2 Genes mutations shown as chaotic iterations 

When considering the model of 2007 with 9 constant parameters, which
generalizes the models of 1998 and 2006, all of the trinucleotides have to
mutate at each time. These models do not take into account the low
mutability of the stop codons. Additionally, they do not allow mutation
strategies to be applied to certain given codons while the other codons do
not mutate. This is why a new model with 3 constant parameters has been
proposed in [5] [10]. In this model, the set of trinucleotides is divided
into two subsets at each time t: the first one is constituted by the
trinucleotides that can possibly mutate at time t, whereas the
trinucleotides of the second one cannot mutate at the considered time. The
trinucleotides that mutate at time t are randomly picked following a
uniform distribution. Consequently, the size and the constitution of the
subset of mutable trinucleotides change at each time t. This subset is
denoted by J(t), and this new model has been called "chaotic model" by the
authors of [8] [10], as opposed to the former "standard model" of 1998.

In the chaotic model, non-mutable trinucleotides cannot have been obtained
by the mutation of other trinucleotides. Consequently, their probability of
occurrence is constant, so its derivative is zero. Conversely, the mutation
parameters of the mutable trinucleotides are those of the model of 1998:
p, q, and r with p + q + r = 1, for each of the three sites of nucleotides.

The new model is thus defined as follows:

$$\begin{cases}
P_i'(t) = 0 & \text{if } i \notin J(t), \\
P_i'(t) = \sum_{j=1}^{64} (A_t - I)_{ij} P_j(t) & \text{if } i \in J(t).
\end{cases}$$

Obviously, this new model is a generalization of the one of 1998: if we
suppose that $\forall t, A_t = A$, and if $\forall t$, $J(t)$ is the set of
all the trinucleotides, then the system above reduces to its second line,
which is exactly the model of 1998.

As the number of mutable trinucleotides changes over time, the mutation
matrix is not constant, so the resolution method used in the standard model
cannot be applied here. To solve the system, the authors of [8] [10] have
considered discrete times small enough to be sure that the mutation matrix
does not change between two instants $t_i$ and $t_{i+1}$.

Let $A^{(k)}$ be the (constant) mutation matrix during the time interval
$[t_{k-1}, t_k]$. To be able to compute $P_i'(t_{k-1})$, the authors of
[8] [10] have used the Euler method, to obtain:

$$\frac{d P_i(t_{k-1})}{dt} = \frac{P_i(t_k) - P_i(t_{k-1})}{h},$$

where $h = t_k - t_{k-1}$ is supposed small and constant. By putting this
formula into the previous system, these authors have finally obtained:

$$\begin{cases}
P_i(t_k) = P_i(t_{k-1}) & \text{if } i \notin J(t), \\
P_i(t_k) = h \sum_{j=1}^{64} (A^{(k)} - I)_{ij} P_j(t_{k-1}) + P_i(t_{k-1}) & \text{if } i \in J(t).
\end{cases}$$

This model has been called the "discrete time chaotic evolution model CM"
in [8] [10]. This discrete version of the continuous chaotic one is,
indeed, a gene evolution model that uses the chaotic iterations of
Definition 1. To understand the interest of this discrete time chaotic
evolution model, we must first recall the discovery by Michel et al. of a
$C^3$ code and its properties.
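
One step of the discrete time scheme of this section can be sketched as
follows. This is an illustration under stated assumptions, not the authors'
implementation: the function name `cm_step` is hypothetical, indices are
0-based, the update is read as P_i(t_k) = P_i(t_{k-1}) + h * sum_j (A - I)_{ij}
P_j(t_{k-1}) for i in J and P_i(t_k) = P_i(t_{k-1}) otherwise, and a toy
3-state matrix with made-up values replaces the 64 x 64 mutation matrix.

```python
from typing import List, Set


def cm_step(P: List[float], A: List[List[float]], J: Set[int], h: float) -> List[float]:
    """One explicit Euler step of the discrete-time model (hypothetical helper).

    P: occurrence probabilities at time t_{k-1}
    A: mutation matrix assumed constant on [t_{k-1}, t_k]
    J: 0-based indices of the components allowed to mutate at this step
    h: time step t_k - t_{k-1}
    """
    n = len(P)
    new_P = list(P)  # components outside J keep their previous value
    for i in J:
        # sum_j (A - I)_{ij} P_j, with I the identity matrix
        drift = sum((A[i][j] - (1.0 if i == j else 0.0)) * P[j] for j in range(n))
        new_P[i] = P[i] + h * drift
    return new_P


# Toy example with 3 states instead of 64 (values are illustrative only).
A = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
P0 = [1.0, 0.0, 0.0]
P1 = cm_step(P0, A, J={0, 1}, h=0.1)
print(P1)  # approximately [0.98, 0.01, 0.0]
```

The third component is left untouched because its index is not in J, which
is the mechanism the text uses to model codons with low mutability.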

2.3 Relevance of the CM model 

A computation of the frequency of each trinucleotide in the 3 frames of
genes, in a large gene population (protein coding region) of both
eukaryotes and prokaryotes, established in 1996 that the distribution of
trinucleotides in these frames is not uniform. Such a surprising result has
led to the definition of 3 subsets of trinucleotides, denoted by $X_0$,
$X_1$, and $X_2$. $X_0$, $X_1$, and $X_2$ are each constituted by 20
trinucleotides. They are linked by the following permutation property:
$X_1 = \{\mathcal{P}(t), t \in X_0\}$, $X_2 = \{\mathcal{P}(t), t \in X_1\}$,
where for every trinucleotide $t = n_0 n_1 n_2$,
$\mathcal{P}(t) = n_1 n_2 n_0$. More details about the research context and
the properties of these sets ($C^3$ code, rarity, largest window length,
higher frequency of "misplaced" trinucleotides, flexibility) can be found
in [8] [10]. Among other things, it has been proven that $X_0$ occurs with
the highest probability (48.8%) in genes (reading frame 0), whereas $X_1$
and $X_2$ occur mainly in the frames 1 and 2, respectively. In other words,
$X_0$ is not pure (its probability is less than 1): it is mixed with $X_1$
and $X_2$ in genes.
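
The permutation property linking the three sets can be illustrated with a
short Python sketch. The function names are hypothetical, the permutation is
read as the circular shift n0 n1 n2 -> n1 n2 n0, and the two-element set
below is a toy stand-in chosen for the example, not the actual 20-element
code X0.

```python
from typing import Set


def circular_permutation(t: str) -> str:
    """Map the trinucleotide n0 n1 n2 to n1 n2 n0."""
    return t[1:] + t[0]


def permuted_set(X: Set[str]) -> Set[str]:
    """Image of a set of trinucleotides under the circular permutation."""
    return {circular_permutation(t) for t in X}


toy_x0 = {"AAC", "GTA"}        # toy subset, for illustration only
toy_x1 = permuted_set(toy_x0)  # {"ACA", "TAG"}
toy_x2 = permuted_set(toy_x1)  # {"CAA", "AGT"}
print(toy_x1, toy_x2)
```

Applying the permutation a third time returns the starting set, since a
trinucleotide has only three circular shifts.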

Such a property can be explained as follows: random mutations have
introduced noise during evolution, leading to a decreased probability of
$X_0$ [5] [10]. Moreover, the codes $X_1$ and $X_2$ are not symmetric in
genes, i.e., $P(X_1) < P(X_2)$ (the probability difference is 4.8%). This
is totally unexpected: the complementarity property should lead to the same
probabilities for $X_1$ and $X_2$, even when considering noise during
evolution.

The standard and chaotic models (with particular strategies for the stop
codons) can explain both the decreased probability of the code $X_0$ and
the asymmetry between the codes $X_1$ and $X_2$ in genes. These standard
and chaotic models construct "primitive" genes, i.e., genes before random
substitutions, with trinucleotides of the circular code $X_0$. These models
are able to find the frequency orders of the three codes $X_0$, $X_1$, and
$X_2$ in genes. In particular, the chaotic model called "CMtaa", with low
mutability of the stop codon TAA, matches the probability discrepancy
between the circular codes $X_1$ and $X_2$ observed in real genes. It
matches this discrepancy better than the standard model SM and the other
chaotic models.

In the following section, we will propose some reasons explaining why some
chaotic models match with good accuracy the frequency orders of the three
codes, when considering the circular code $X_0$ as constituting the
"primitive" genes. More precisely, we will show that some gene evolution
mechanisms are chaotic according to Devaney, thus explaining why chaotic
models fit such an evolution.

3 The CM model is a truly chaotic one 

First of all, let us recall that the term "chaotic", in the name of these iterations, 
has a priori no link with the mathematical theory of chaos, recalled below. 

3.1 Devaney's chaotic dynamical systems

Consider a topological space $(X, \tau)$ and a continuous function $f$ on
$X$.

Definition 2 $f$ is said to be topologically transitive if, for any pair of
open sets $U, V \subset X$, there exists $k > 0$ such that
$f^k(U) \cap V \neq \emptyset$.

Definition 3 An element (a point) $x$ is a periodic element (point) for $f$
of period $n \in \mathbb{N}^*$, if $f^n(x) = x$.

Definition 4 $f$ is said to be regular on $(X, \tau)$ if the set of
periodic points for $f$ is dense in $X$: for any point $x$ in $X$, any
neighborhood of $x$ contains at least one periodic point.

Definition 5 $f$ is said to be chaotic on $(X, \tau)$ if $f$ is regular and
topologically transitive.

The chaos property is strongly linked to the notion of "sensitivity",
defined on a metric space $(X,d)$ by:

Definition 6 $f$ has sensitive dependence on initial conditions if there
exists $\delta > 0$ such that, for any $x \in X$ and any neighborhood $V$
of $x$, there exists $y \in V$ and $n \geq 0$ such that
$d(f^n(x), f^n(y)) > \delta$.
$\delta$ is called the constant of sensitivity of $f$.

Indeed, Banks et al. have proven that when $f$ is chaotic and $(X,d)$ is a
metric space, then $f$ has the property of sensitive dependence on initial
conditions (this property was formerly an element of the definition of
chaos). To sum up, quoting Devaney in [12], a chaotic dynamical system "is
unpredictable because of the sensitive dependence on initial conditions. It
cannot be broken down or simplified into two subsystems which do not
interact because of topological transitivity. And in the midst of this
random behavior, we nevertheless have an element of regularity".
Fundamentally different behaviors are consequently possible and occur in an
unpredictable way.

3.2 Chaotic iterations and Devaney's chaos

In this section we outline the proofs of the properties establishing that
the CM model is truly chaotic, as defined in Devaney's theory. The complete
theoretical framework is detailed in [?].

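Denote by $\Delta$ the discrete Boolean metric,
$\Delta(x,y) = 0 \Leftrightarrow x = y$. Given a function $f$, define the
function $F_f : \llbracket 1;N \rrbracket \times \mathbb{B}^N \to \mathbb{B}^N$
such that

$$F_f(k,E) = \left( E_j \cdot \Delta(k,j) + f(E)_k \cdot \overline{\Delta(k,j)} \right)_{j \in \llbracket 1;N \rrbracket}.$$

Let us consider the phase space
$\mathcal{X} = \llbracket 1;N \rrbracket^{\mathbb{N}} \times \mathbb{B}^N$
and the map $G_f(S,E) = (\sigma(S), F_f(i(S),E))$, where $\sigma$ is
defined by
$\sigma : (S^n)_{n \in \mathbb{N}} \in \mathbb{S} \to (S^{n+1})_{n \in \mathbb{N}} \in \mathbb{S}$,
and $i$ is the map
$i : (S^n)_{n \in \mathbb{N}} \in \mathbb{S} \to S^0 \in \llbracket 1;N \rrbracket$.
So the chaotic iterations can be described by the following iterations:

$$X^0 \in \mathcal{X} \text{ and } X^{k+1} = G_f(X^k).$$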

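We have defined in [?] a new distance $d$ between two points
$(S,E), (\check{S},\check{E}) \in \mathcal{X}$ by
$d((S,E);(\check{S},\check{E})) = d_e(E,\check{E}) + d_s(S,\check{S})$,
where:

• $d_e(E,\check{E}) = \sum_{k=1}^{N} \Delta(E_k,\check{E}_k) \in \llbracket 0;N \rrbracket$,

• $d_s(S,\check{S}) = \frac{9}{N} \sum_{k=1}^{\infty} \frac{|S^k - \check{S}^k|}{10^k} \in [0;1]$.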
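
To make this distance concrete, here is a small Python sketch evaluated on
truncated strategies. It is only an illustration: the function names are
hypothetical, the state part is read as a Hamming distance between Boolean
vectors, and the strategy part is assumed to be (9/N) times the sum over
k >= 1 of |S^k - S'^k| / 10^k, cut off at the length of the given prefixes.

```python
from typing import Sequence, Tuple


def d_e(E1: Sequence[int], E2: Sequence[int]) -> int:
    """Number of cells whose Boolean states differ."""
    return sum(1 for a, b in zip(E1, E2) if a != b)


def d_s(S1: Sequence[int], S2: Sequence[int], n_cells: int) -> float:
    """Weighted difference of two strategy prefixes (assumed 9/N normalization)."""
    return 9.0 / n_cells * sum(
        abs(a - b) / 10 ** k for k, (a, b) in enumerate(zip(S1, S2), start=1)
    )


def distance(p1: Tuple[Sequence[int], Sequence[int]],
             p2: Tuple[Sequence[int], Sequence[int]],
             n_cells: int) -> float:
    """Sum of the state part and the strategy part, for points (S, E)."""
    (S1, E1), (S2, E2) = p1, p2
    return d_e(E1, E2) + d_s(S1, S2, n_cells)


point_a = ([1, 2, 3], [0, 1, 0, 1])
point_b = ([1, 2, 4], [0, 1, 1, 1])
d_ab = distance(point_a, point_b, n_cells=4)
print(d_ab)  # close to 1.00225
```

Under this reading the integer part of the distance counts the differing
cells, while the fractional part shrinks as the two strategies agree on a
longer prefix.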

It is then proven that,
Proposition 1 $G_f$ is a continuous function on $(\mathcal{X},d)$.

In the metric space $(\mathcal{X},d)$, the vectorial negation
$f_0 : \mathbb{B}^N \to \mathbb{B}^N$,
$(b_1,\dots,b_N) \mapsto (\overline{b_1},\dots,\overline{b_N})$ satisfies
the three conditions for Devaney's chaos: regularity, transitivity, and
sensitivity [?]. So,

Proposition 2 $G_{f_0}$ is a chaotic map on $(\mathcal{X},d)$ according to
Devaney.

Thus the model that gives the best results, in a certain way, for the
problem of gene evolution prediction is a chaotic model. We will give in
the next sections a result concerning inversions that can possibly explain
this fact, at least partially.

4 How to Formalize Inversions



4.1 The inversion operator 

Let $\mathcal{N} = \{A,T,C,G\}$ be the set of nucleotides and
$N \in \mathbb{N}^*$. A chromosome with $N$ nucleotides is any element of
$\mathcal{N}^N$. Let $\mathcal{C}_N$ be the set of all chromosomes of size
$N$. For each $C, C' \in \mathcal{C}_N$, the chromosome $C$ is said to be
changeable into $C'$ if and only if there is a permutation mapping $C$ into
$C'$. We denote it by $C \approx C'$. From a mathematical point of view,
$\approx$ is an equivalence relation. From a biological point of view, the
equivalence class $\mathcal{C}$ of $C$ corresponds to all of the possible
and conceivable reorderings of the chromosome $C$ over time. By reordering,
we mean a simple change of the order of the nucleotides in $C$.

We will focus on the evolution of a chromosome $C^0$ of $\mathcal{C}$ over
time, when we suppose that intrachromosomal inversions can occur. These
inversions have the form:

$$(n_1,\dots,n_{i-1},n_i,n_{i+1},\dots,n_{j-1},n_j,n_{j+1},\dots,n_N) \longrightarrow
(n_1,\dots,n_{i-1},n_j,n_{j-1},\dots,n_{i+1},n_i,n_{j+1},\dots,n_N).$$

The sequence of inversions corresponds to the sequence of segments
$([a^i;b^i])_{i \in \mathbb{N}}$, where $[a^i;b^i]$ represents the
nucleotide segment that mutates at time $i$: the nucleotides from
$n_{a^i}$ to $n_{b^i}$ are inverted.

Let $\mathcal{S}_N = (\llbracket 1;N \rrbracket \times \llbracket 1;N \rrbracket)^{\mathbb{N}}$
be the set of all the possible evolutions by inversion over time,
$\mathcal{C}$ be the equivalence class of a chromosome with $N$
nucleotides, and $\mathcal{X}(\mathcal{C}) = \mathcal{C} \times \mathcal{S}_N$.

We define the global inversion by:

$$i : \mathcal{C} \to \mathcal{C}, \quad (n_1,\dots,n_N) \mapsto (n_N,\dots,n_1).$$

In other words, for a chromosome $C$, $i(C)$ is the chromosome in which the
first nucleotide becomes the last one, etc. Let us notice that, as the DNA
strand is always read in the $5' \to 3'$ direction, $i(C) \neq C$ in
general.

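Let us now define the partial inversion function, as follows:

$$f : \mathcal{C} \times \llbracket 1;N \rrbracket^2 \to \mathcal{C}, \quad
((n_1,\dots,n_N),(a,b)) \mapsto (n'_1,\dots,n'_N),$$

with:

$$n'_k = \begin{cases}
n_k & \text{if } k \notin \llbracket a;b \rrbracket, \\
\left(\sigma^{N-a-b+1} \circ i\right)(n_1,\dots,n_N)_k & \text{else},
\end{cases}$$

where $\sigma$ is the nucleotide circular shift:

$$\sigma : \mathcal{C} \to \mathcal{C}, \quad (n_1,\dots,n_N) \mapsto (n_2,\dots,n_N,n_1).$$

So $f(C,(a,b))$ is the chromosome corresponding to the inversion of the
segment $\llbracket a;b \rrbracket$ in the chromosome $C$. Furthermore, for
each $(i,j) \in \llbracket 1;N \rrbracket^2$, we define
$f_{(i,j)}(C) = f(C,(i,j))$.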
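
The definition through the global inversion and the circular shift can be
checked numerically. The Python sketch below is an illustration with
hypothetical helper names; it reads the partial inversion as
n'_k = (sigma^(N-a-b+1) o i)(C)_k inside the segment [a;b] and n'_k = n_k
outside, with 1-based positions, and compares the result with a direct
reversal of the segment.

```python
from typing import Tuple

Chromosome = Tuple[str, ...]


def global_inversion(c: Chromosome) -> Chromosome:
    """(n_1, ..., n_N) -> (n_N, ..., n_1)."""
    return c[::-1]


def circular_shift(c: Chromosome, m: int) -> Chromosome:
    """Apply the shift (n_1, ..., n_N) -> (n_2, ..., n_N, n_1) m times."""
    m %= len(c)  # Python's % keeps the result in [0, N) even for negative m
    return c[m:] + c[:m]


def partial_inversion(c: Chromosome, a: int, b: int) -> Chromosome:
    """Invert the segment [a; b] (1-based, inclusive) using shift and inversion."""
    n = len(c)
    mirrored = circular_shift(global_inversion(c), n - a - b + 1)
    return tuple(
        mirrored[k - 1] if a <= k <= b else c[k - 1] for k in range(1, n + 1)
    )


chromosome = tuple("ATCGGA")
inverted = partial_inversion(chromosome, 2, 5)
print("".join(inverted))  # AGGCTA
```

The exponent N - a - b + 1 is what aligns position k of the shifted mirror
image with position a + b - k of the original chromosome.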
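Remark 1 $f$ can be rewritten as

$$f : \mathcal{C} \times \llbracket 1;N \rrbracket^2 \to \mathcal{C}, \quad
((n_1,\dots,n_N),(a,b)) \mapsto
\left( n_k \left(1 - \chi_{\llbracket a;b \rrbracket}(k)\right)
+ \left(\sigma^{N-a-b+1} \circ i\right)(n_1,\dots,n_N)_k \, \chi_{\llbracket a;b \rrbracket}(k) \right)_{k=1,\dots,N},$$

where $\chi_X$ is the indicator function of the set $X$:

$$\chi_X(x) = \begin{cases}
1 & \text{if } x \in X, \\
0 & \text{else}.
\end{cases}$$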

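The inversion operator $\mathfrak{I}$ is finally defined, for a given
family of equivalent chromosomes $\mathcal{C}$ having $N$ nucleotides, by:

$$\mathfrak{I} : \mathcal{X}(\mathcal{C}) \to \mathcal{X}(\mathcal{C}), \quad
((n_1,\dots,n_N);(S^0,S^1,\dots)) \mapsto \left(f((n_1,\dots,n_N),S^0);(S^1,S^2,\dots)\right),$$

that is, $\mathfrak{I}(C,S) = (f(C,S^0);\Sigma(S))$, where
$\Sigma\left((S^n)_{n \in \mathbb{N}}\right) = (S^{n+1})_{n \in \mathbb{N}}$.
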
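
As a hedged illustration of how this operator acts, the sketch below applies
it twice to a chromosome paired with a finite prefix of an inversion
sequence. The names `invert_segment` and `inversion_operator` are
hypothetical, segments are 1-based and inclusive, and the infinite sequence
of the definition is truncated to a Python list.

```python
from typing import List, Tuple

Chromosome = Tuple[str, ...]
Segment = Tuple[int, int]


def invert_segment(c: Chromosome, a: int, b: int) -> Chromosome:
    """Reverse the nucleotides at positions a..b (1-based, inclusive)."""
    return c[:a - 1] + c[a - 1:b][::-1] + c[b:]


def inversion_operator(
    point: Tuple[Chromosome, List[Segment]]
) -> Tuple[Chromosome, List[Segment]]:
    """Apply the first inversion of the sequence, then shift the sequence."""
    c, s = point
    (a, b), rest = s[0], s[1:]
    return invert_segment(c, a, b), rest


point = (tuple("ATCGGA"), [(2, 5), (1, 3)])
after_one = inversion_operator(point)
after_two = inversion_operator(after_one)
print("".join(after_one[0]), "".join(after_two[0]))  # AGGCTA GGACTA
```

Each application consumes one term of the sequence, so iterating the
operator replays the whole history of inversions on the chromosome.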
4.2 A metric for chromosomes 

We can define a distance $d$ on $\mathcal{X}(\mathcal{C})$ by:
$\forall A = (C_A,S_A), B = (C_B,S_B) \in \mathcal{C} \times \mathcal{S}_N$,

$$d(A,B) = d_C(C_A,C_B) + d_S(S_A,S_B),$$

where:

• $d_C((n_1,\dots,n_N);(n'_1,\dots,n'_N)) = \sum_{i=1}^{N} \delta(n_i,n'_i)$,
with $\delta(n,n') = 0$ if $n = n'$, and $\delta(n,n') = 1$ else.

• $d_S(S,S') = \frac{9}{N} \sum_{i=1}^{\infty} \frac{1}{10^i} \sum_{j=1}^{N} \delta\left(g(S^i)_j, g(S'^i)_j\right)$,
with:

$$g(a,b) = (1,\dots,a-1,b,b-1,\dots,a+1,a,b+1,\dots,N).$$
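
The two parts of this distance can be sketched in Python as follows. This is
an illustration resting on assumptions: the helper names are hypothetical,
g(a, b) is taken to be the permutation of (1, ..., N) that reverses the
block a..b, the chromosome part is a Hamming distance, and the strategy part
is assumed to be (9/N) times the sum over i >= 1 of 10^(-i) times the number
of positions where g(S^i) and g(S'^i) differ, truncated to finite prefixes.

```python
from typing import Sequence, Tuple

Segment = Tuple[int, int]


def g(a: int, b: int, n: int) -> Tuple[int, ...]:
    """Permutation (1, ..., a-1, b, b-1, ..., a, b+1, ..., n), assuming a <= b."""
    return tuple(range(1, a)) + tuple(range(b, a - 1, -1)) + tuple(range(b + 1, n + 1))


def hamming(u: Sequence, v: Sequence) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


def d_c(c1: Sequence[str], c2: Sequence[str]) -> int:
    """Chromosome part: count of differing nucleotides."""
    return hamming(c1, c2)


def d_s(s1: Sequence[Segment], s2: Sequence[Segment], n: int) -> float:
    """Strategy part on finite prefixes (assumed 9/N normalization)."""
    return 9.0 / n * sum(
        hamming(g(*x, n), g(*y, n)) / 10 ** i
        for i, (x, y) in enumerate(zip(s1, s2), start=1)
    )


n = 6
dist = d_c("ATCGGA", "AGGCTA") + d_s([(2, 4)], [(2, 3)], n)
print(g(2, 4, n), dist)  # (1, 4, 3, 2, 5, 6) and a value close to 4.45
```

Comparing the permutations g(a, b) rather than the raw pairs (a, b) means two
segments are far apart only when they reorder many positions differently.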

Proposition 3 $d$ is a distance on $\mathcal{X}(\mathcal{C})$.
Proof 1 We will show that $d$ is the sum of two distances.
1. Let us firstly demonstrate that $d_C$ is a distance:

• $d_C(C,C') = 0 \Rightarrow \forall i, \delta(n_i,n'_i) = 0$. So,
$\forall i, n_i = n'_i$, and thus $C = C'$.

• $d_C(C,C') = \sum_{i=1}^{N} \delta(n_i,n'_i) = \sum_{i=1}^{N} \delta(n'_i,n_i) = d_C(C',C)$.

• $\forall C,C',C'' \in \mathcal{C}, d_C(C,C'') \leq d_C(C,C') + d_C(C',C'')$.
Indeed, $\forall i \in \llbracket 1;N \rrbracket$,
$\delta(n_i,n''_i) \leq \delta(n_i,n'_i) + \delta(n'_i,n''_i)$, because:

- it is obvious if $\delta(n_i,n''_i) = 0$,

- else, $\delta(n_i,n''_i) = 1$, which implies either $n_i \neq n'_i$ or
$n'_i \neq n''_i$. And so, either $\delta(n_i,n'_i) = 1$ or
$\delta(n'_i,n''_i) = 1$.

2. Let us now prove that $d_S$ is a distance too. Obviously,
$d_S(S,S') = d_S(S',S)$ and $d_S(S,S) = 0$. Finally, the triangle
inequality of $d_S$ is inherited from the triangle inequality of $\delta$.

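Proposition 4 The inversion operator $\mathfrak{I}$ is a continuous
function on $(\mathcal{X}(\mathcal{C}),d)$.

Proof 2 Let $(C^k,S^k) \to (C,S)$. Then $d_C(C^k,C) \to 0$ and
$d_S(S^k,S) \to 0$.

• On the one hand, as $d_C(C^k,C) \to 0$ and due to the fact that $d_C$ is
an integer metric, we have: $\exists k_0, k \geq k_0 \Rightarrow C^k = C$.
Additionally, as $d_S(S^k,S) \to 0$,
$\exists k_1 \in \mathbb{N}, k \geq k_1 : d_S(S^k,S) < 10^{-1}$. So
$\sum_{j=1}^{N} \delta\left(g((S^k)^0)_j ; g(S^0)_j\right) = 0$.
In other words, $\forall j \in \llbracket 1;N \rrbracket$,
$g((S^k)^0)_j = g(S^0)_j$. And thus $(S^k)^0 = S^0$.
Finally, $\forall k \geq \max(k_0,k_1)$,
$f(C^k,(S^k)^0) = f(C^k,S^0) \Rightarrow
\lim_{k \to \infty} f(C^k,(S^k)^0) = \lim_{k \to \infty} f(C^k,S^0) = f(C,S^0)$.

• On the other hand,

$$0 \leq d_S(\Sigma(S^k),\Sigma(S))
= \frac{9}{N} \sum_{i=1}^{\infty} \frac{1}{10^i} \sum_{j=1}^{N} \delta\left(g\left((\Sigma S^k)^i\right)_j ; g\left((\Sigma S)^i\right)_j\right)$$

$$= \frac{9}{N} \sum_{i=2}^{\infty} \frac{1}{10^{i-1}} \sum_{j=1}^{N} \delta\left(g\left((S^k)^i\right)_j ; g\left(S^i\right)_j\right)$$

$$\leq 10 \times \frac{9}{N} \sum_{i=1}^{\infty} \frac{1}{10^i} \sum_{j=1}^{N} \delta\left(g\left((S^k)^i\right)_j ; g\left(S^i\right)_j\right)
= 10 \, d_S(S^k,S) \to 0.$$

We can thus conclude to the continuity of $\mathfrak{I}$ on
$(\mathcal{X}(\mathcal{C}),d)$.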

Let us now introduce two lemmas. Their proofs are obvious.
Lemma 1 Any transposition $(i,j)$ can be written as the composition
$f_{(i+1,j-1)} \circ f_{(i,j)}$.

Lemma 2 Any permutation can be written as a composition of partial
inversions $f_{(i,j)}$.
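
Lemma 1 can be verified exhaustively on a small example. The sketch below is
only a sanity check with hypothetical helper names: it reads the lemma as
"inverting the segment [i; j] and then the inner segment [i+1; j-1] swaps
positions i and j", with 1-based positions and an empty inner segment
treated as the identity.

```python
from typing import Tuple


def partial_inversion(c: Tuple, a: int, b: int) -> Tuple:
    """Reverse positions a..b (1-based, inclusive); identity if a >= b."""
    if a >= b:
        return c
    return c[:a - 1] + c[a - 1:b][::-1] + c[b:]


def transposition(c: Tuple, i: int, j: int) -> Tuple:
    """Swap the entries at positions i and j (1-based)."""
    items = list(c)
    items[i - 1], items[j - 1] = items[j - 1], items[i - 1]
    return tuple(items)


sequence = (1, 2, 3, 4, 5, 6)  # distinct labels make every position visible
lemma_holds = all(
    partial_inversion(partial_inversion(sequence, i, j), i + 1, j - 1)
    == transposition(sequence, i, j)
    for i in range(1, len(sequence) + 1)
    for j in range(i + 1, len(sequence) + 1)
)
print(lemma_holds)  # True
```

Since every permutation is a product of transpositions, this check also
gives the intuition behind Lemma 2.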
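5 Chaos of DNA inversion

Proposition 5 The inversion operator $\mathfrak{I}$ is strongly transitive
on $(\mathcal{X}(\mathcal{C}),d)$.

Proof 3 Let $A = (N_A,S_A)$ and $B = (N_B,S_B)$ be two points of
$\mathcal{X}(\mathcal{C})$, and $\varepsilon > 0$. We define
$\tilde{N} = N_A$, and
$\forall k \leq k_0 = -\lfloor \log_{10}(\varepsilon) \rfloor$,
$\tilde{S}^k = (S_A)^k$. Let $N' = \mathfrak{I}^{k_0}(N_A,S_A)_1$. There is
a permutation that maps $N'$ on $N_B$, so there exists
$S' = (S'_1,S'_2,\dots,S'_{k_1}) \in \left(\llbracket 1;N \rrbracket^2\right)^{k_1}$
such that $\mathfrak{I}^{k_1}(N',S')_1 = N_B$.
Then the point $(\tilde{N},\tilde{S})$ defined by:

• $\tilde{N} = N_A$,

• $\forall k \leq k_0, \tilde{S}^k = (S_A)^k$,

• $\forall k \in \llbracket 1;k_1 \rrbracket, \tilde{S}^{k_0+k} = S'_k$,

• $\forall k \in \mathbb{N}, \tilde{S}^{k_0+k_1+k+1} = (S_B)^k$,

is $\varepsilon$-close to $A$, and such that
$\mathfrak{I}^{k_0+k_1}(\tilde{N},\tilde{S}) = B$.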

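Proposition 6 The inversion operator $\mathfrak{I}$ is regular on
$(\mathcal{X}(\mathcal{C}),d)$.

Proof 4 Let $A = (N_A,S_A) \in \mathcal{X}(\mathcal{C})$ and
$\varepsilon > 0$. We define
$k_0 = -\lfloor \log_{10}(\varepsilon) \rfloor$ and
$N' = \mathfrak{I}^{k_0}(A)_1$. A permutation can be found that maps $N'$
into $N_A$, then there exists $S' = (S'^1,\dots,S'^{k_1})$ such that
$\mathfrak{I}^{k_1}(N',S')_1 = N_A$.
Then the point $(\tilde{N},\tilde{S})$ defined by:

• $\tilde{N} = N_A$,

• $\forall k \leq k_0, \tilde{S}^k = (S_A)^k$,

• $\forall k \in \llbracket 1;k_1 \rrbracket, \tilde{S}^{k_0+k} = S'^k$,

• $\forall k \in \mathbb{N}, \tilde{S}^{k_0+k_1+k+1} = \tilde{S}^k$,

is a periodic point $\varepsilon$-close to $A$.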
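
The construction used in this proof can be mimicked on a small example. The
sketch below is an illustration under stated assumptions, not the authors'
procedure: helper names are hypothetical, segments are 1-based, the return
path to the initial chromosome is found by a simple greedy search (one
possible choice among many), and the periodic strategy is represented by the
finite block that is to be repeated forever.

```python
import math
from typing import List, Tuple

Chromosome = Tuple[str, ...]
Segment = Tuple[int, int]


def invert(c: Chromosome, a: int, b: int) -> Chromosome:
    """Reverse positions a..b (1-based, inclusive)."""
    return c[:a - 1] + c[a - 1:b][::-1] + c[b:]


def inversions_mapping(source: Chromosome, target: Chromosome) -> List[Segment]:
    """Greedy list of inversions turning source into target (same nucleotides)."""
    current, steps = source, []
    for p in range(1, len(source) + 1):
        if current[p - 1] != target[p - 1]:
            # first later position holding the wanted nucleotide (1-based)
            q = current.index(target[p - 1], p) + 1
            steps.append((p, q))
            current = invert(current, p, q)
    return steps


def periodic_block(c: Chromosome, prefix: List[Segment]) -> List[Segment]:
    """Prefix followed by inversions that bring the chromosome back to c."""
    moved = c
    for a, b in prefix:
        moved = invert(moved, a, b)
    return list(prefix) + inversions_mapping(moved, c)


eps = 0.05
k0 = -math.floor(math.log10(eps))       # number of strategy terms kept from A
start = tuple("ATCGGA")
block = periodic_block(start, [(2, 5), (1, 3)][:k0])

replayed = start
for a, b in block:
    replayed = invert(replayed, a, b)
print(k0, block, replayed == start)  # 2 [(2, 5), (1, 3), (1, 3), (2, 5)] True
```

Repeating the block indefinitely gives a strategy that shares its first k0
terms with the original one and brings the chromosome back to its starting
configuration after every pass through the block.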

As the inversion dynamic is both transitive and regular, we can thus
conclude that,

Theorem 1 DNA inversion is chaotic, as defined in Devaney's theory.

6 Consequences 

7 Conclusion 